Fully distributed data mining

Abstract

One of the group's main activities involves distributed data mining. The GOLF framework developed earlier was extended with the capability to adapt to dynamically changing environments and concept drift. The basic idea behind this method is to maintain a fixed age distribution among the models in the network. Young models help in adapting to sudden changes, while older models provide stability. The proposed method maintains this age distribution and makes use of model age during model propagation. The group members also devised new distributed solutions for bandit algorithms. In the model they proposed, every node in the network holds an identical bandit. The goal is to let the peers exchange as much information as possible about the arms with as little communication as possible. The group's solution achieves a linear speedup in the number of network nodes using a gossip protocol. An additional result concerns storing graph-structured databases optimally over multiple sites. Graphs should ideally be stored so that the number of edges between any two sites is minimal, which leads to a partitioning problem. The research group proposed a fully distributed simulated annealing heuristic to tackle the problem, and demonstrated its favorable performance on several real-world networks. The paper describing this research won the best paper award at IEEE SASO 2013. The team's researchers also studied peer sampling algorithms, which are key components of peer-to-peer systems. They focused on environments where creating new connections is expensive and where NAT-ed nodes are present, and proposed a solution in which a dynamic but very slowly changing network is maintained, over which short random walks are performed.
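To make the partitioning objective above concrete, here is a minimal Python sketch of simulated annealing that assigns graph nodes to sites while trying to reduce the number of edges that cross sites. It is a centralized toy version, not the group's fully distributed heuristic. The function names, the swap move, the linear cooling schedule, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def edge_cut(edges, site_of):
    """Count edges whose two endpoints are stored on different sites."""
    return sum(1 for u, v in edges if site_of[u] != site_of[v])


def anneal_partition(nodes, edges, num_sites=2, steps=5000, t0=2.0, seed=0):
    """Centralized toy simulated annealing for edge-cut minimization.

    Hypothetical helper: the swap move and linear cooling are assumptions,
    not details taken from the abstract.
    """
    rng = random.Random(seed)
    # Round-robin start gives balanced sites; swaps keep site sizes fixed.
    site_of = {n: i % num_sites for i, n in enumerate(nodes)}
    cut = edge_cut(edges, site_of)
    best, best_cut = dict(site_of), cut

    for step in range(steps):
        # Temperature decays linearly towards (almost) zero.
        temp = t0 * (1 - step / steps) + 1e-9
        a, b = rng.sample(nodes, 2)
        if site_of[a] == site_of[b]:
            continue  # swapping within one site changes nothing
        site_of[a], site_of[b] = site_of[b], site_of[a]
        new_cut = edge_cut(edges, site_of)
        delta = new_cut - cut
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cut = new_cut
            if cut < best_cut:
                best, best_cut = dict(site_of), cut
        else:
            # Rejected: undo the swap.
            site_of[a], site_of[b] = site_of[b], site_of[a]
    return best, best_cut


if __name__ == "__main__":
    # Two 4-node cliques joined by a single bridge edge (3, 4).
    left, right = [0, 1, 2, 3], [4, 5, 6, 7]
    edges = [(u, v) for grp in (left, right)
             for i, u in enumerate(grp) for v in grp[i + 1:]]
    edges.append((3, 4))
    assignment, cut = anneal_partition(list(range(8)), edges)
    print("cut edges:", cut, "assignment:", assignment)
```

Swapping two nodes between sites, rather than moving a single node, keeps the amount of data per site constant. That is one simple way to avoid the trivial solution of placing the whole graph on a single site.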

Similar resources

Privacy Preserving Frequency Mining in 2-Part Fully Distributed Setting

Recently, privacy preservation has become one of the key issues in data mining. In many data mining applications, computing frequencies of values or tuples of values in a data set is a fundamental operation repeatedly used. Within the context of privacy preserving data mining, several privacy preserving frequency mining solutions have been proposed. These solutions are crucial steps in many pri...


Privacy Preserving CART Algorithm over Vertically Partitioned Data

Data mining classification algorithms are centralized algorithms that work on a centralized database. In this information age, organizations use distributed databases. Since data mining of private data is one of the keys to success for an organization, it is a challenging task to implement data mining in a distributed database. Collaboration among different organizations brings mutual benefits to the pa...


Asynchronous Peer-to-Peer Data Mining with Stochastic Gradient Descent

Fully distributed data mining algorithms build global models over large amounts of data distributed over a large number of peers in a network, without moving the data itself. In the area of peer-to-peer (P2P) networks, such algorithms have various applications in P2P social networking, and also in trackerless BitTorrent communities. The difficulty of the problem involves realizing good quality ...


A Fully Distributed Framework for Cost-Sensitive Data Mining

Data mining systems aim to discover patterns and extract useful information from facts recorded in databases. A widely adopted approach is to apply machine learning algorithms to compute descriptive models or classifiers from the available data. Two of the main challenges in this area are that i) databases are large and possibly physically distributed, and ii) data are cost-sensitive, or exampl...


Towards Data Mining in Large and Fully Distributed Peer-to-Peer Overlay Networks

The Internet, which is becoming an increasingly dynamic and extremely heterogeneous network, has recently become a platform for huge, fully distributed peer-to-peer overlay networks containing millions of nodes, typically for the purpose of information dissemination and file sharing. This paper targets the problem of analyzing data which are scattered over such a huge and dynamic set of nodes, where ...


A Survey on Big Data, Data Mining and Overlay Based Parallel Data Mining

The main goal of the data mining process is to extract useful information from a Big Data set and transform it into an understandable form for further use. Previously, it was not possible to extract useful information from large datasets or data streams. This can now be achieved through Big Data mining. The overlay-based parallel data mining architecture executes processing by employing the o...




Publication date: 2016